Boosting Generic Visual-Linguistic Representation with Dynamic Contexts

Authors

Abstract

Pretraining large models on generous multi-modal corpora has accelerated the development of visual-linguistic (VL) representation and achieved great success on various vision-and-language downstream tasks. Learning these models is usually executed by predicting randomly masked words in captions or patches in images. Such approaches, nevertheless, seldom explore the supervision of causalities behind caption descriptions or the procedure of generating events beyond still images. In this work, we endow pretrained models with high-level cognition by delving into dynamic contexts to model visual and linguistic causalities uniformly. Specifically, we format the dynamic contexts of an image as sentences describing the events before, on, and after the image. Unlike traditional caption-wise similarity, we propose a novel dynamic contexts-based similarity (DCS) metric, in which the correlation of potential causes and effects, besides the immediate visual content, is considered to measure the relevance among images. DCS can be further simplified by parameterizing event continuity to relax the requirements of dense contextual event annotations. A new pre-task is designed to minimize the feature distances of dynamically relevant images and to incorporate event causality and commonsense knowledge into VL representation learning. Models based on our dynamic contexts significantly outperform typical VL models on multiple cross-modal downstream tasks, including conventional visual commonsense reasoning (VCR), visual question answering (VQA), zero-shot image-text retrieval, and extended image / event ordering tasks.
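
The abstract does not state the DCS formula. As a rough illustration only, the sketch below assumes each image comes with three context-sentence embeddings (before, on, after) and scores two images by a weighted average of per-slot cosine similarities. The function names, the weights, and the embedding inputs are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]
SLOTS = ("before", "on", "after")  # assumed names for the three context slots


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dynamic_context_similarity(
    ctx_a: Dict[str, Vector],
    ctx_b: Dict[str, Vector],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-slot cosine similarities (illustrative, not the paper's DCS)."""
    if weights is None:
        # Assumption: treat causes, immediate content, and effects equally.
        weights = {slot: 1.0 / len(SLOTS) for slot in SLOTS}
    total = sum(weights[slot] for slot in SLOTS)
    score = sum(weights[slot] * cosine(ctx_a[slot], ctx_b[slot]) for slot in SLOTS)
    return score / total


# Toy embeddings: both images share the same "on" content but differ in causes and effects.
image_a = {"before": [1.0, 0.0], "on": [1.0, 1.0], "after": [1.0, 0.0]}
image_b = {"before": [0.0, 1.0], "on": [1.0, 1.0], "after": [0.0, 1.0]}

caption_only = {"before": 0.0, "on": 1.0, "after": 0.0}
caption_score = dynamic_context_similarity(image_a, image_b, caption_only)
context_score = dynamic_context_similarity(image_a, image_b)
```

With these toy inputs, a caption-only weighting treats the two images as identical, while the context-aware weighting scores them lower because their "before" and "after" embeddings disagree.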


Similar resources

Generic Object Recognition with Strangeness and Boosting

People can quickly recognize an enormous number of rigid and non-rigid objects, such as cars, faces, and trees, regardless of viewpoint, lighting, illumination, and local deformation. How to recognize generic objects has long been a hard problem in psychophysics, neurobiology, and computation. Based on research in psychophysics and neurobiology, humans interpret the image scene (label class) ...

Fighting biases with dynamic boosting

While gradient boosting algorithms are the workhorse of modern industrial machine learning and data science, all current implementations are susceptible to a nontrivial but damaging form of label leakage. It results in a systematic bias in pointwise gradient estimates that leads to reduced accuracy. This paper formally analyzes the issue and presents solutions that produce unbiased pointwise gra...

Visual Representation of 3D Language Constructs Specified by Generic Depictions

Several modeling domains make use of three-dimensional representations, e.g., the “ball-and-stick” models of molecules. Our generator framework DEViL3D supports the design and implementation of visual 3D languages for such modeling purposes. The front-end of a language implementation generated by DEViL3D is a dedicated 3D graphical structure editor, which is used to construct programs in that d...

Mining linguistic tone patterns with symbolic representation

This paper conceptualizes speech prosody data mining and its potential application in data-driven phonology/phonetics research. We first conceptualize Speech Prosody Mining (SPM) in a time-series data mining framework. Specifically, we propose using efficient symbolic representations for speech prosody time-series similarity computation. We experiment with both symbolic and numeric representati...

Leveraging k-NN for generic classification boosting

Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it enables very good performance in several practical applications. However, it suffers from various drawbacks, like sensitivity to “noisy” instances and poor generalization properties when ...

Journal

Journal title: IEEE Transactions on Multimedia

Year: 2023

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2023.3237164